SD3.5 Large support #1719

Merged
merged 31 commits into sd3
Nov 1, 2024

Conversation

kohya-ss
Owner

SD3 Medium fine-tuning works.

@kohya-ss
Owner Author

SD3.5L training seems to work now. --disable_mmap_load_safetensors is recommended on Windows for faster safetensors loading.

Implemented block swap. With 30 blocks swapped, SD3.5L training may be possible with 12GB of VRAM.

Random dropout of the Text Encoder embeddings is not implemented yet, and neither is LoRA training.
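
For reference, a rough sketch of the block-swap idea (this is not the actual implementation in this branch; the block interface and names are simplified placeholders): blocks at the tail of the MMDiT stack stay in CPU memory and are moved to the GPU only while they are needed.

```python
import torch
import torch.nn as nn

class BlockSwapStack(nn.Module):
    """Illustrative only: keep the last `blocks_to_swap` blocks on the CPU and
    move each one to the GPU just before it runs, then evict it again."""

    def __init__(self, blocks: nn.ModuleList, blocks_to_swap: int, device: str = "cuda"):
        super().__init__()
        self.blocks = blocks
        self.device = device
        self.swap_start = len(blocks) - blocks_to_swap
        for i, block in enumerate(self.blocks):
            block.to(device if i < self.swap_start else "cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            if i >= self.swap_start:
                block.to(self.device)   # bring the swapped block into VRAM
            x = block(x)
            if i >= self.swap_start:
                block.to("cpu")         # evict it to keep peak VRAM low
        return x
```

The real code is more involved (transfers are overlapped with compute, and the backward pass also matters), but this shows why swapping more blocks trades speed for lower VRAM usage.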

@kohya-ss kohya-ss mentioned this pull request Oct 24, 2024
@FurkanGozukara

Awesome, can't wait to test it once it has matured.

@kohya-ss
Owner Author

Basic LoRA training (MMDiT only) may work now.

@kohya-ss
Owner Author

Fixed latent scaling/shifting.

@kohya-ss
Owner Author

--clip_l_dropout_rate, --clip_g_dropout_rate and --t5_dropout_rate options have been added to sd3_train.py and sd3_train_network.py. Each of these options sets the output of the corresponding Text Encoder to 0 with the specified probability. The default value is 0 (no dropout). The value in the SAI technical report is 0.464 (46.4%) for each encoder, but the optimal values seem to vary depending on the dataset.

LoRA training including the Text Encoders should now work correctly. Each of the dropout options also works for LoRA training, but the optimal values are unknown.
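
Conceptually, each option does something like the following (a simplified sketch, not the actual code in the scripts; names are placeholders):

```python
import torch

def maybe_drop_te_output(cond: torch.Tensor, dropout_rate: float, is_training: bool) -> torch.Tensor:
    """Zero the whole text-encoder output with probability `dropout_rate`.

    Each encoder (CLIP-L, CLIP-G, T5-XXL) gets its own independent draw, so with
    rate p all three are dropped at the same time only with probability p**3.
    """
    if is_training and dropout_rate > 0.0 and torch.rand(1).item() < dropout_rate:
        return torch.zeros_like(cond)
    return cond

# Usage sketch (one independent draw per encoder):
# clip_l_out = maybe_drop_te_output(clip_l_out, args.clip_l_dropout_rate, True)
# clip_g_out = maybe_drop_te_output(clip_g_out, args.clip_g_dropout_rate, True)
# t5_out     = maybe_drop_te_output(t5_out,     args.t5_dropout_rate,     True)
```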

@bghira

bghira commented Oct 27, 2024

L and G aren't dropped out separately, and you can't drop out all of CLIP either, because it's used for adanorm.

@kohya-ss
Owner Author

L and G aren't dropped out separately, and you can't drop out all of CLIP either, because it's used for adanorm.

Thank you! Hmm, Appendix B.3 of their technical paper http://arxiv.org/pdf/2403.03206 states the following:

For unconditional diffusion guidance (Ho & Salimans, 2022), we set the outputs of each of the three text encoders independently to zero with a probability of 46.4%, such that we roughly train an unconditional model in 10% of all steps.

So I think it makes sense to drop them separately. (With three independent dropouts at 46.4%, all three text encoders are zeroed at the same time with probability 0.464³ ≈ 0.10, which matches the paper's "roughly 10% of all steps".)

@FurkanGozukara

@kohya-ss what is the purpose of dropout? So when SAI was training, they randomly dropped out each one almost 50% of the time?

So with dropout, does that mean only the U-Net part is being trained?

@bghira

bghira commented Oct 27, 2024

This kind of dropout is for pretraining, not for finetuning. For small datasets, dropping out the text encoders that much will merely harm the model; it will overcondition the uncond space.

@waomodder

@kohya-ss
Sorry to report it here of all places, but SD3.5 Large LoRA training worked properly!

[three sample result images attached]

I used FP8 base, learning_rate 1e-4, dim 64, alpha 32.0, roughly 8,000 images, and ran 24,000 steps.
The loss converged stably at around 0.1.

@kohya-ss
Owner Author

@kohya-ss what is the purpose of dropout? So when SAI was training, they randomly dropped out each one almost 50% of the time?

I don't know the details either, so please see the technical paper: https://arxiv.org/pdf/2403.03206.

@kohya-ss
Owner Author

Sorry to report it here of all places, but SD3.5 Large LoRA training worked properly!

I'm glad it seems to have worked fine!

@waomodder

@Bocchi-Chan2023
The optimizer is AdamW8bit, and the scheduler is cosine.

@mliand

mliand commented Oct 29, 2024

INFO clip_l is not included in the checkpoint and clip_l_path is not provided  sd3_utils.py:117
INFO clip_g is not included in the checkpoint and clip_g_path is not provided  sd3_utils.py:177
INFO t5xxl is not included in the checkpoint and t5xxl_path is not provided  sd3_utils.py:232

The text encoder path parameters specified in the toml configuration file do not seem to be picked up when training.

@kohya-ss
Owner Author

The text encoder path parameters specified in the toml configuration file do not seem to be picked up when training.

Please specify the respective weight files downloaded from HuggingFace for each option: clip_l, clip_g, t5xxl.
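
For example, if you pass arguments through a --config_file style toml, the relevant entries would look something like this (the paths are placeholders; point them at the files downloaded from HuggingFace):

```toml
pretrained_model_name_or_path = "/path/to/sd3.5_large.safetensors"
clip_l = "/path/to/clip_l.safetensors"
clip_g = "/path/to/clip_g.safetensors"
t5xxl = "/path/to/t5xxl_fp16.safetensors"
```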

@kohya-ss
Owner Author

split_qkv, train_block_indices and emb_dims network_args should work for LoRA training.
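
For example, in a --config_file style toml (the value formats below are my assumptions for illustration, not verified syntax; please check the README and the network module source for the authoritative format):

```toml
network_args = [
  "split_qkv=True",            # split the fused qkv projection for LoRA
  "train_block_indices=0,1,2", # assumed format: which MMDiT blocks to train
  # emb_dims is also accepted; see the README for its exact value format
]
```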

@kohya-ss
Owner Author

Added SD3.5M support.

The --pos_emb_random_crop_rate option has been added to sd3_train.py and sd3_train_network.py. It specifies the probability of the random crop augmentation described on the model card: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium.

0 means no random crop, 1 means always. The default is 0.

@nephi-dev

Can't load the VAE for either 3.5 or 3; I'm getting a missing key error.

@nephi-dev

Can't load the VAE for either 3.5 or 3; I'm getting a missing key error.

Actually, I managed to extract the VAE directly from SD3 and it works now. Maybe I was using the wrong one.

@kohya-ss
Owner Author

Can't load the VAE for either 3.5 or 3; I'm getting a missing key error.

SAI's SD3.5L/M checkpoints seem to have the VAE built in. Please omit the --vae option.

@nephi-dev

Do you have a minimal-parameters config to train a LoRA on SD3.5M? I've tried some configs (that I've used for FLUX), but got no results at all.

@kohya-ss
Owner Author

Do you have a minimal-parameters config to train a LoRA on SD3.5M? I've tried some configs (that I've used for FLUX), but got no results at all.

README.md has been updated.

@kohya-ss
Owner Author

Added support for SD3.5M multi-resolution training. The feature has not been fully tested yet, so please let us know if you find any issues.

The latent cache format has changed, so please delete the previous cache files (old caches still work, but garbage will remain in the files).

The idea and code for SD3.5M's positional embedding were contributed by KBlueLeaf. Thank you, KBlueLeaf!
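
Untested on my side, but a dataset config for the multi-resolution support might look roughly like this (whether separate [[datasets]] entries per resolution is the intended usage is an assumption here; the updated README is authoritative, and the paths are placeholders):

```toml
[general]
enable_bucket = true
caption_extension = ".txt"

[[datasets]]
resolution = 1024
batch_size = 1

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1

[[datasets]]
resolution = 768
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/path/to/images"
  num_repeats = 1
```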

@kohya-ss kohya-ss marked this pull request as ready for review November 1, 2024 10:05
@kohya-ss
Owner Author

kohya-ss commented Nov 1, 2024

Fixed a memory leak when caching latents. This does not affect data that has already been cached.

Images were not being discarded after latent conversion.
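
In other words, the caching loop kept references to the full-size image tensors. A minimal sketch of the corrected pattern (not the actual caching code; a diffusers-style VAE interface is assumed here):

```python
import torch

def cache_latents(vae, images: list) -> list:
    """Encode images to latents one by one and drop each image as soon as it is used."""
    latents = []
    with torch.no_grad():
        for i in range(len(images)):
            latent = vae.encode(images[i]).latent_dist.sample().to("cpu")
            latents.append(latent)
            images[i] = None  # release the image tensor so it can be garbage collected
    return latents
```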

@kohya-ss
Owner Author

kohya-ss commented Nov 1, 2024

As the main functionality appears to be working, I'll proceed with merging this branch. Thank you for your significant contributions.

edit: This branch will be removed in the next few days.

@kohya-ss kohya-ss merged commit 264328d into sd3 Nov 1, 2024
2 checks passed
@FurkanGozukara

FurkanGozukara commented Nov 3, 2024

--disable_mmap_load_safetensors

Why is this not the default? What is the reason for not making it the default? Thank you.

For example, on RunPod machines model loading is painfully slow when training (e.g. loading FLUX); can it help there too?
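
For reference, what the option changes conceptually (a rough sketch, not the repository's actual loading code): by default safetensors memory-maps the file, while the disable-mmap path reads the whole file into RAM first and deserializes from bytes, which can be much faster on Windows and on some network-backed storage.

```python
from safetensors.torch import load, load_file

def load_state_dict(path: str, disable_mmap: bool = False):
    if disable_mmap:
        # Read the entire file into memory, then deserialize from the bytes.
        with open(path, "rb") as f:
            return load(f.read())
    # Default: memory-mapped loading.
    return load_file(path)
```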
